The present invention is related to [...], hereby incorporated [...]:

Serial No. 09/___, entitled "Apparatus and Method for Controlling [...] During
Speculative Instruction Branching" (AUS990815US1); and

Serial No. 09/___, entitled "Apparatus and Method for Accessing a Memory Device
During Speculative Instruction Branching".

TECHNICAL FIELD

The present invention relates generally to [...] processing [...] branch
predictions by selectively accessing [...].

BACKGROUND INFORMATION

Modern high frequency microprocessors [...] pipelines, instructions are
fetched and executed speculatively, [...]





PATENT 

AT9-98-544 






direction and target of the branch instructions being fetched many cycles before the
branch actually gets executed. When the branch is executed, the direction and
target of the branch are determined. If it is determined that the branch has been
mispredicted (either its target or its direction), then the branch instruction is
completed and all successive instructions are flushed out of the instruction pipeline
and new instructions are fetched either from the sequential path of the branch (if the
branch is resolved as not taken) or from the target path of the branch (if the branch is
resolved as taken).

Many branch prediction techniques have been developed. For example, in
Bimodal Branch Prediction, selected bits from a branch address are used as a pointer
to a branch history table. The entries to this table maintain a history as to the number
of times a corresponding branch instruction is actually taken during instruction
execution versus the number of times the branch instruction is not taken. From this
history a prediction can be made as to the probability that an instruction will be taken
during instruction execution and therefore whether a sequential instruction following
that branch instruction or the target of the branch instruction should be fetched into
the instruction pipeline.
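
The bimodal scheme described above can be illustrated with a minimal sketch. It is not taken from the specification: the class name, the use of low-order address bits as the pointer, and the two-bit saturating counter used to summarize the taken/not-taken history are all assumptions chosen for illustration.

```python
class BimodalPredictor:
    """Illustrative bimodal predictor: one saturating counter per table entry."""

    def __init__(self, index_bits=4, counter_bits=2):
        # index_bits and counter_bits are hypothetical sizes, not from the text.
        self.mask = (1 << index_bits) - 1
        self.max_count = (1 << counter_bits) - 1
        self.threshold = (self.max_count + 1) // 2
        # Start every counter just below the "predict taken" threshold.
        self.table = [self.threshold - 1] * (1 << index_bits)

    def _index(self, branch_address):
        # Selected (here: low-order) address bits act as the table pointer.
        return branch_address & self.mask

    def predict(self, branch_address):
        # True means "predict taken".
        return self.table[self._index(branch_address)] >= self.threshold

    def update(self, branch_address, taken):
        # Count up on taken, down on not taken, saturating at both ends.
        i = self._index(branch_address)
        if taken:
            self.table[i] = min(self.table[i] + 1, self.max_count)
        else:
            self.table[i] = max(self.table[i] - 1, 0)
```

A predictor built this way is consulted with `predict(address)` at fetch time and corrected with `update(address, taken)` once the branch resolves.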

Other branch prediction techniques include Global Branch Prediction, Global
Predictor with Index Selection (Gselect) and Global History with Index Sharing
(Gshare). Often, execution of a branch instruction will depend on execution of a
previous instruction. Therefore, in these schemes, a global history shift register is
used which stores history bits for a given number of previously executed branch
instructions. The history bits tag if the corresponding branch was taken or not taken.
[...] to address a global branch history table [...] combined with selected bits from
the current branch instruction to address the global [...]. In Gshare, an XOR
operation is performed using the bits [...].

Each of the existing schemes is subject to disadvantages, either in [...] of hardware
(e.g., the number of arrays required for the tables), or [...] for more efficient
methods and circuitry for performing branch prediction.
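
As a rough illustration of how the Gselect and Gshare schemes named above commonly form a table pointer, the sketch below concatenates history bits with address bits for Gselect and XORs them for Gshare. The function names, bit widths and choice of low-order bits are assumptions for illustration only; the passage does not specify them.

```python
def gselect_index(branch_address, history, addr_bits, hist_bits):
    """Concatenate low history bits with low branch-address bits."""
    addr = branch_address & ((1 << addr_bits) - 1)
    hist = history & ((1 << hist_bits) - 1)
    return (hist << addr_bits) | addr


def gshare_index(branch_address, history, index_bits):
    """XOR low branch-address bits with low history bits."""
    mask = (1 << index_bits) - 1
    return (branch_address ^ history) & mask


def shift_history(history, taken, hist_bits):
    """Shift the newest outcome (1 = taken, 0 = not taken) into the register."""
    return ((history << 1) | int(taken)) & ((1 << hist_bits) - 1)
```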
SUMMARY OF THE INVENTION

According to the principles of the present invention, branch prediction
circuitry is disclosed which includes a bimodal branch history table, a fetch-based
branch history table and a selector table. The bimodal branch history
table includes a plurality of entries each for storing a prediction value and accessed by
selected bits of a branch address. The fetch-based branch history table has a plurality
of entries for storing a prediction value and accessed by a pointer generated from
selected bits of the branch address and bits from a history register. The selector table
comprises a plurality of entries each for storing a selection bit and accessed by a
pointer generated from selected bits from the branch address and bits from the history
register. Each selector bit is used for selecting between a prediction value accessed
from the bimodal history table and a prediction value accessed from the fetch-based
history table.
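
The three-table arrangement summarized above can be sketched as a single lookup function. This is a simplified illustration under stated assumptions: the tables are plain Python lists holding one prediction bit per entry, the shared pointer is formed by XORing address and history bits, and a selector bit of 0 picks the bimodal table. None of these details is fixed by the summary itself.

```python
def select_prediction(bimodal_table, fetch_table, selector_table,
                      branch_address, history, index_bits):
    """Return the prediction chosen by the selector bit (hypothetical layout)."""
    mask = (1 << index_bits) - 1
    # The bimodal table is pointed to by branch-address bits alone.
    bimodal_ptr = branch_address & mask
    # Assumed pointer function for the other two tables: address XOR history.
    shared_ptr = (branch_address ^ history) & mask
    bimodal_pred = bimodal_table[bimodal_ptr]
    fetch_pred = fetch_table[shared_ptr]
    selector_bit = selector_table[shared_ptr]
    # Assumed polarity: 0 selects the bimodal value, 1 the fetch-based value.
    return fetch_pred if selector_bit else bimodal_pred
```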

In the preferred embodiments of the present inventive principles, circuitry,
systems and methods combine bimodal branch prediction with fetch-based branch
prediction using only three tables. In addition to minimizing circuitry, the present
inventive concepts do not require extensive and/or complicated schemes for accessing
these tables, as was required in the prior art.

The foregoing has outlined rather broadly the features and technical
advantages of the present invention in order that the detailed description of the
invention that follows may be better understood. Additional features and advantages
of the invention will be described hereinafter which form the subject of the claims of
the invention.
BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present invention, and the
advantages thereof, reference is now made to the following descriptions taken in
conjunction with the accompanying drawings, in which:

FIGURE 1 is a high level functional block diagram of a representative data
processing system suitable for practicing the principles of the present invention;

FIGURE 2A is a high level functional block diagram of selected operational
[...];

FIGURE 2B illustrates in further detail a portion of the selected operational
[...];

FIGURE 3A is a high level functional block diagram of a branch prediction
circuitry embodying the inventive principles;

FIGURE 3B illustrates in further detail a portion of the branch prediction
[...];

FIGURE 4 illustrates, in partial schematic form, another portion of the branch
[...];

FIGURE 5 [...] and fetch-based branch history tables of FIGURE 3A in accordance
with the principles of the present invention;

FIGURE 6 is a flow diagram illustrating a first method [...] fetch-based branch
history and selector tables of FIGURE 3A;

FIGURE 7 is a flow diagram illustrating a second method of updating [...]; and

FIGURES 8A and 8B illustrate, in flowchart form, a methodology for [...] (GHV)
[...].
DETAILED DESCRIPTION

[...] those skilled in the art are capable of practicing [...]. It will be recognized
that, in the drawings, only those signal lines and processor blocks necessary for the
operation of the present invention are shown.

Referring to the drawings, depicted elements are not necessarily [...] similar
elements are designated by the same [...].

[...] system 100 suitable for practicing the principles of the [...].

CPU 10 operates in conjunction with read-only memory (ROM) 16 and random
access memory (RAM) [...]. Among other things, ROM 16 supports the basic [...]
[...] DRAM (Dynamic Random Access Memory) system memory and SRAM [...]
external cache. [...]

I/O adapter [...] interconnection between the devices on system [...] and external
peripherals, such as [...] device (e.g., a hard drive, floppy drive or CD/ROM drive),
or a [...] coupled to a peripheral control interface (PCI) [...], for example PCI bus
bridge [...].

User interface adapter [...] couples various [...] keyboard [...], mouse [...].

Display adapter 36 supports [...] (CRT) [...]. Display adapter 36 may include,
among [...] and frame buffer memory.

System 100 can be selectively coupled to a computer or telecommunications
network through communications adapter [...] include, for example, a modem for
connection to a [...] network and/or hardware and software for connecting to a [...]
area network (LAN) or wide area [...].

FIGURE 2 is a high level functional block diagram of selected operational
blocks within CPU 10. [...] instruction cache (I-cache) 40 and [...] through bus 12
and bus interface unit 44 [...] data retrieved from [...] or floating point [...].

[...] fields 58a-58e [...] field 58c [...] prediction value and [...] preselected
number [...] this configuration, a counter (entry) [...].

[...] It should be noted, however, that in alternate [...] concepts [...].

Local branch history table 301 is accessed for obtaining branch predictions [...]
instruction fetch address register (IFAR) [...] pointer will be denoted [...]
accessed for obtaining branch predictions [...] from the current cache line address
are [...] process for accessing the history tables [...] history table, gbht_read_addr.
The access [...] used by selection logic 308 to select [...] prediction values output
from LBHT 301 or the fetch-based branch predictions [...] p multiplexers (MUXs),
which output [...] 310. Note that a number of the prediction values may be from
[...] remaining number may be from GBHT [...] instructions are fetched from
memory, including internal memory, such as I-cache 40 [...]
CPU 10. Thus, the number of predictions in an entry accommodates all of the
instructions that are fetched in a single cycle, which may be referred to as a fetch
group (FG). The number, p, of instructions in a fetch group may be eight in an
embodiment of the present invention. In the illustrated embodiment, a Logic 0
accessed from selector table (GSEL) 303 selects the output from LBHT 301 while a
Logic 1 selects the output from GBHT 302. Generally, selector table 303 tracks the
performance of the local and fetch-based branch history tables for a given branch
instruction. The table having the better prediction history for a given
branch instruction is then used to perform the current branch prediction.

The GHV from which the gbht_read_addr is generated, as described [...],
tracks the history of branch instructions [...]. As branches are executed and
resolved, the GHV is updated [...]. GHV logic 311 [...] GHV and loads it into GHV
register 306. GHV logic 311 is described in detail in [...].

Additionally, the entries in LBHT 301, GBHT 302 and GSEL 303 must also
be updated in response to the execution of branch instructions. The entries are
updated by providing information to the appropriate entry in LBHT 301, GBHT 302
and GSEL 303 for setting or resetting, as appropriate, the [...] corresponding entry,
depending on the prediction and the resolution, or actual outcome, of the branch.
The information sent to LBHT 301 may be referred to as "lbht_write_data", the
update information provided to GBHT 302 may be referred to as "gbht_write_data",
and the update information for GSEL 303 may be referred to as "gsel_write_data."
The values of lbht_write_data, gbht_write_data and gsel_write_data are generated
by counter logic 312 and loaded, respectively, into LBHT write data register 314,
GBHT write data register 316 and GSEL write data register 318. Counter logic 312
generates the values of lbht_write_data, gbht_write_data and gsel_write_data in
response to an actual branch direction determined when the corresponding branch
instruction executes, and the predictions, from BIQ field 58b in the entry 58
corresponding to the resolved branch instruction. The methodology for generating
the values of lbht_write_data, gbht_write_data and gsel_write_data will be
described in detail in conjunction with FIGURE 5.

The corresponding entry in the respective one of LBHT 301, GBHT 302 and
GSEL 303 is accessed using an address generated from the branch address, field 58a,
in the corresponding entry 58 in BIQ 56. The address into LBHT 301, which may be
referred to as "lbht_write_addr," constitutes the m-bit branch address in field 58a. A
number, n, of bits of lbht_write_addr are used to select the LBHT entry and the
remaining, m-n, bits index the counters in the selected entry. Thus, 2^(m-n) = p. Note
that the m-bit branch address may be a portion of the full address of a branch
instruction. The address for accessing GBHT 302 and GSEL 303, which may be
referred to as "gbht_write_addr," is generated by XORing n bits of the branch address
from BIQ field 58a with the GHV value in BIQ field 58c corresponding to the branch
instruction for which the history table entries are being updated. The resulting n-bit
value is concatenated with the remaining m-n bits of the branch address in field 58a to
form the m-bit value of gbht_write_addr. The n-bit portion addresses the entry of the
GBHT and the (m-n)-bit portion indexes the counters in the entry. In an embodiment
of the present invention m may be fourteen and n may be eleven. The methodology
[...] Refer now to FIGURE 4 [...] GHV register 306 [...] current [...] GHV0
logic 402 [...] coupled to an [...]. Depending on [...] the GHV may need to be
recovered. [...] GHV1 logic 404 and GHV2 logic 406 hold [...] previous states of
the GHV [...], respectively. [...] GHV2 logic 406 may also be [...] input of [...]
402. Similarly, [...] of GHV1 logic 404 [...] the fetch group [...]
.,„.hefertSro»ph-^-'"=' 
It has also been assumed that it takes three cycles to determine if there is a cache
miss. If there are more cycles to determine either a taken prediction, or a cache miss,
additional GHVx logic stages, denoted generically by GHVx, may be incorporated, as
would be recognized by an artisan of ordinary skill in the art, with taken-prediction
path 410 and cache miss path 408 tapped off the corresponding GHVx logic stage.

As discussed above, the first input to GHV0 logic 402 is provided by the
output of MUX 414. MUX 414 is a five-way multiplexer, the inputs of which provide a
value of the GHV in accordance with each of a set of actions that may cause the GHV
to be modified. MUX 414 selects for the signal on one of the five inputs in response
to GHV select logic 416. In an embodiment of the present invention GHV logic
may be provided with a two-phase clock signal (not shown in FIGURE 4 for
simplicity). Select logic 416 renders selects 409, 411, 413 active on a first
predetermined phase of the two-phase clock and the register portions of GHV0
logic 402, GHV1 logic 404 and GHV2 logic 406 latch data on a second
predetermined phase of the two-phase clock. The logic states associated with each of
the inputs, denoted cache miss path 408, taken-prediction path 410, hold path 412,
misprediction path 420 and advance path 426 will now be described. The operation
of GHV logic 311 in rendering each of the inputs active will be described in detail in
conjunction with FIGURES 8A and 8B where each of the inputs corresponds to a
sequence of steps ("paths") within the methodology of FIGURES 8A and 8B.

Hold path 412 is active if the processor holds the instruction pipeline.
Processors, such as CPU 10 in FIGURE 1, may hold a pipeline if, for example, an
[...] selects for outputting the signal on hold path 412, which is provided by the
output of GHV0 logic 402. Likewise, GHV0 select 413 assumes the logic state for
selecting the feedback input from the output of GHV0 logic 402.

GHV select logic 416 selects taken-prediction path 410 as the active path if a
branch in the fetch group is predicted taken. When taken-prediction path 410 is
active, the value of the GHV output by GHV1 logic 404 is left shifted with the value
"1" shifted into the least significant bit (LSB). GHV0 select 413 assumes a logic state
selecting for the input to GHV0 logic 402 coupled to the output of MUX 414.
Similarly, GHV1 select 411 and GHV2 select 409 assume the logic states for
selecting the inputs to the respective one of GHV1 logic 404 and GHV2 logic 406
coupled to the output of GHV0 logic 402 and the output of GHV1 logic 404.

In the event of a cache miss, cache miss path 408 becomes active in response
to GHV select logic 416. GHV2 select 409 assumes the logic state for selecting the
feedback input into GHV2 logic 406, and each of GHV0 select 413 and GHV1 select
411 assumes the logic state for selecting the input of GHV0 logic 402 coupled to MUX
414 and the input of GHV1 logic 404 coupled to the output of GHV0 logic 402,
respectively.

If, on resolution, a branch is mispredicted, misprediction path 420 becomes
the active path for MUX 414. Each of GHV0 select 413, GHV1 select 411 and
GHV2 select 409 assumes the logic state for selecting the input coupling each of logic
402, 404 and 406 to the output of its preceding one of MUX 414, logic 402 and logic
404, such that the value of the GHV on misprediction path 420 is loaded into the
register portion of each of GHV0 logic 402, GHV1 logic 404, and GHV2 logic 406.

The value of the GHV on misprediction path 420 depends on the resolution of
the branch, and the position of the mispredicted branch instruction in the fetch group.
Misprediction selection logic 422 selects for one of three inputs in three-way MUX
424. If the actual outcome is resolved as not taken, and the resolved branch is not the
last instruction in the fetch group, the value of the GHV in BIQ field 58c,
FIGURE 2B, is selected. If, however, the resolved branch is the last instruction in the
corresponding fetch group, then the GHV from BIQ field 58c is left shifted with "0"
entered in the LSB thereof. If the mispredicted branch was resolved as taken, MUX
424 selects for the value of the GHV in BIQ field 58c left shifted with "1" loaded in
the LSB.
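
The three-way selection performed for a mispredicted branch can be sketched as a small function. This is an illustration, not the circuit: the function and parameter names are hypothetical, and the default width of eleven bits assumes the GHV is n bits wide with n equal to eleven, with the oldest bit dropped on a shift.

```python
def recovered_ghv(biq_ghv, resolved_taken, last_in_fetch_group, ghv_bits=11):
    """Value placed on the misprediction path for a mispredicted branch."""
    mask = (1 << ghv_bits) - 1
    if resolved_taken:
        # Resolved taken: saved GHV left shifted with "1" in the LSB.
        return ((biq_ghv << 1) | 1) & mask
    if last_in_fetch_group:
        # Resolved not taken, last in fetch group: left shift with "0".
        return (biq_ghv << 1) & mask
    # Resolved not taken, not last in fetch group: saved GHV unchanged.
    return biq_ghv & mask
```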

If there is no instruction fetch redirection from one or more branch
instructions in the fetch group, or a cache miss, or from an instruction pipeline hold,
then the instruction pipeline can advance to the next sequential address from the last
instruction in the current fetch group. Advance path 426 is then active, and the
current GHV is left shifted with "0" loaded in the LSB. GHV1 select 411 selects for
the input in GHV1 logic 404 coupled to the output of GHV0 logic 402 and GHV2
select 409 selects for the input in GHV2 logic 406 coupled to the output of GHV1
logic 404. Likewise, GHV0 select 413 selects for the output of MUX 414.

The methodologies for generating branch predictions and updating the branch
history tables and updating the GHV in accordance with the principles of the present
invention may now be described. These methodologies may be executed by the
branch prediction circuitry 300 in FIGURE 3A. Refer now to FIGURE 5, illustrating
methodology 500 for generating branch predictions and storing branch information in
the BIQ. In step 502, the entry in the LBHT pointed to by the value in the IFAR is
read. The addressing of entries in LBHT via the IFAR has been discussed
hereinabove in conjunction with FIGURE 3A. In step 504, the GBHT and GSEL
entries pointed to by the gbht_read_addr, as also described hereinabove in
conjunction with FIGURE 3A, are read. In step 506, the LBHT data or the GBHT
data is selected in response to the value in the corresponding entry in GSEL 303.
The selection may be made via MUX 308 in response to the GSEL data. For example,
as described above, in an embodiment of the present invention, a Logic 0 in the
corresponding entry in GSEL 303 selects the output from LBHT 301 and a Logic 1
selects the output from GBHT 302. In an embodiment of the present invention in
accordance with apparatus 300 of FIGURE 3A, the LBHT data, the GBHT data, and
the GSEL data may be held in respective data registers, for example, LBHT data
register 320, GBHT data register 322, and GSEL data register 324. The selected data
is stored in a prediction register, such as prediction register 310, in step 508.

If, in step 510, it is determined that the fetch group contains an unconditional
branch, or, in response to the predictions in step 508, a conditional branch is predicted
taken, then in step 512 a taken-prediction fetch redirection is generated. This
redirection may signal GHV select logic 416, FIGURE 4, to select taken-prediction
path 410 as the active path. Then, in step 514, for each branch in the fetch group, a
BIQ entry is allocated and the branch information is stored in the corresponding
fields, such as fields 58a-58e of entry 58, FIGURE 2B.
Referring now to FIGURE 6, there is illustrated [...] writing to the branch history
table entries in accordance with an embodiment of the present invention. In
step 602, it is determined if the branch instruction has [...] in a current processor
cycle. If not, methodology 600 waits one cycle, step 604. Otherwise, if a branch has
been resolved, in step 606 the branch information is obtained from the corresponding
BIQ entry. In step 608, the value of [...] bits of the branch address from the BIQ, for
example [...]. In step 610 the value of gbht_write_addr is set to the XOR of n bits of
the branch address and the value of the n-bit GHV from the BIQ entry, for example,
field 58c of BIQ entry 58 in FIGURE 2B. The remaining portion of gbht_write_addr,
constituting the remaining m-n bits of the m-bit branch address from the
corresponding BIQ entry field, as previously described hereinabove in conjunction
with FIGURE 3A, is set in step 612. Recall that in an embodiment of the present
invention m may be fourteen and n may be eleven.

In step 616, it is determined if the branch prediction is the actual outcome. If
not, in step 618 a branch misprediction redirect is generated. In response, in
an embodiment of the present invention in accordance with apparatus 300 in
FIGURE 3A, GHV select logic 416, FIGURE 4, and GHV logic 311 may select
misprediction path 420 as the active path. Additionally, in step 618 a branch
misprediction signal provided to misprediction selection logic 422, FIGURE 4, is
asserted. If, however, in step 616 the prediction and actual outcome are the same,
step 618 is bypassed.
In step 620, it is determined if the prediction from the LBHT is correct, and
the prediction from the GBHT incorrect. If so, in step 622, the value of
gsel_write_data is set to "0." Otherwise, in step 620, the "No" branch is followed and
in step 624 it is determined if the GBHT prediction is correct and the LBHT
prediction incorrect. If so, in step 626, the value of gsel_write_data is set to "1." In
an embodiment of the present invention, in accordance with methodology 600 in
which, in step 622, the value of gsel_write_data is set to "0" and the value, in
step 626, of gsel_write_data is set to "1," MUXs 308 select data from LBHT data
register 320 in response to a logic state of "0" and select data from GBHT data
register 322 in response to a logic state of "1." However, an artisan of ordinary skill
in the art would recognize that a complementary embodiment of MUXs 308 may be
used, in which embodiment a value of "1" would be written in step 622, and a value
of "0" would be written in step 626 of methodology 600. It would be further
understood by an artisan of ordinary skill in the art that such an alternative
embodiment would be in the spirit and scope of the present invention.

After setting the value of gsel_write_data in either step 622 or 626, the value
of gsel_write_data is written to the entry in GSEL 303 pointed to by the value of
gbht_write_addr, step 628. If, however, in step 624, the GBHT prediction is incorrect
or the LBHT prediction is correct, that is, the LBHT and GBHT predictions were both
correct or both incorrect, wherein step 620 takes the "No" branch, steps 622, 626 and
628 are bypassed, and the corresponding entry in GSEL 303 is unchanged.
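
Steps 620 through 628 amount to a small decision function. The sketch below is an illustration under assumptions: the function names are hypothetical, `None` is used here to mean "leave the entry unchanged", and the selector table is modeled as any indexable Python container.

```python
def gsel_update_600(lbht_correct, gbht_correct):
    """Return the bit to write to GSEL, or None to leave the entry unchanged."""
    if lbht_correct and not gbht_correct:    # step 620 "Yes" -> step 622
        return 0
    if gbht_correct and not lbht_correct:    # step 624 "Yes" -> step 626
        return 1
    return None                              # steps 622, 626 and 628 bypassed


def apply_gsel_update(gsel, gbht_write_addr, lbht_correct, gbht_correct):
    """Write the selector entry only when exactly one table was correct."""
    value = gsel_update_600(lbht_correct, gbht_correct)
    if value is not None:
        gsel[gbht_write_addr] = value        # step 628
    return gsel
```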

Next, the entries in the LBHT and GBHT are updated. In step 630, it is 
determined if the branch resolved as taken. If not, lbht_write_data is set to "0" and 
remaining m - n bits of the m-bit branch address from the corresponding BIQ entry
field, as previously described hereinabove in conjunction with FIGURE 3A, is set in
step 712. Recall that in an embodiment of the present invention, m may be fourteen
and n may be eleven.
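
The formation of the write addresses described in conjunction with FIGURE 3A can be sketched as follows, using m = 14 and n = 11. Which n bits of the branch address are XORed with the GHV is not stated in the text; the sketch assumes the upper n bits select the entry and the lower m - n bits index the counters. All function names are hypothetical.

```python
M_BITS = 14   # m: width of the branch address held in the BIQ
N_BITS = 11   # n: bits that select a table entry
LOW_BITS = M_BITS - N_BITS   # m - n bits index a counter; 2**LOW_BITS = p


def lbht_write_addr(branch_address):
    """The LBHT write address is the m-bit branch address itself."""
    return branch_address & ((1 << M_BITS) - 1)


def gbht_write_addr(branch_address, ghv):
    """XOR n address bits with the GHV, then append the remaining m - n bits."""
    addr = branch_address & ((1 << M_BITS) - 1)
    entry_bits = addr >> LOW_BITS                  # assumed: upper n bits
    counter_bits = addr & ((1 << LOW_BITS) - 1)    # remaining m - n bits
    entry = (entry_bits ^ ghv) & ((1 << N_BITS) - 1)
    return (entry << LOW_BITS) | counter_bits


def split_addr(write_addr):
    """Split an m-bit write address into (entry index, counter index)."""
    return write_addr >> LOW_BITS, write_addr & ((1 << LOW_BITS) - 1)
```

With these widths each entry holds 2^(14-11) = 8 counters, which agrees with the relation 2^(m-n) = p stated for the LBHT.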

In step 716, it is determined if the branch prediction is the actual outcome. If
not, in step 718 a branch misprediction redirect is generated. In response, in an
embodiment of the present invention in accordance with mechanism 300 in FIGURE
3A, GHV select logic 416, FIGURE 4, and GHV logic 311 may select misprediction
path 420 as the active path. Additionally, in step 718 a branch misprediction signal
provided to misprediction selection logic 422, FIGURE 4, is asserted. If, however, in
step 716 the prediction and actual outcome are the same, step 718 is bypassed.

In step 720, it is determined if the LBHT prediction is correct. If so, in
step 722, the value of gsel_write_data is set to "0." Otherwise, in
step 720, the "No" branch is followed and in step 724 it is determined if the GBHT
prediction is correct. If so, in step 726, the value of gsel_write_data is set to "1." In
an embodiment of the present invention, in accordance with methodology 700 in
which, in step 722, the value of gsel_write_data is set to "0" and the value, in step 726,
of gsel_write_data is set to "1," MUXs 308 select data from LBHT data register 320
in response to a logic state of "0" and select data from GBHT data register 322 in
response to a logic state of "1." However, an artisan of ordinary skill in the art would
recognize that a complementary embodiment of MUXs 308 may be used, in which
embodiment a value of "1" would be written in step 722, and a value of "0" would be
written in step 726 of methodology 700. It would be further understood by an artisan
of ordinary skill in the art that such an alternative embodiment would be in the spirit
and scope of the present invention. After setting the value of gsel_write_data in
either step 722 or 726, the value of gsel_write_data is written to the entry in GSEL
303 pointed to by the value of gbht_write_addr, step 728. If, however, in step 724,
the GBHT prediction is incorrect, that is, both the LBHT and GBHT predictions were
incorrect by virtue of step 720, the corresponding entry in GSEL 303 is unchanged,
and steps 722, 726 and 728 are bypassed.

Next, the entries in the LBHT and GBHT are updated. In step 730, it is
determined if the branch resolved as taken. If not, lbht_write_data is set to "0" and
written to the LBHT at the entry pointed to by lbht_write_addr, step 732. Similarly,
the value of gbht_write_data is set to "0" and written to the GBHT entry pointed to by
gbht_write_addr, step 734. Methodology 700 then returns to step 702. If, however,
in step 730, the branch was resolved as taken, then, in step 736, the value of
lbht_write_data is set to "1" and written to the LBHT at the address pointed to by
lbht_write_addr. Likewise, in step 738, the value of gbht_write_data is set to "1" and
written to the GBHT at the entry pointed to by gbht_write_addr, and methodology
700 returns to step 702. Recall, however, as discussed in conjunction with
FIGURE 6, complementary values of lbht_write_data and gbht_write_data may be
used in steps 732, 736, and 734 and 738, respectively.
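
Steps 730 through 738 can be sketched in a few lines. The function names and the `complementary` flag are hypothetical, and the tables are modeled as simple containers holding one bit per addressed counter, which is a simplification of the entry and counter organization described earlier.

```python
def history_write_data(resolved_taken, complementary=False):
    """Bit written to both history tables for a resolved branch (step 730)."""
    bit = 1 if resolved_taken else 0
    # A complementary embodiment writes the inverted value.
    return bit ^ 1 if complementary else bit


def update_history_tables(lbht, gbht, lbht_write_addr, gbht_write_addr,
                          resolved_taken):
    """Write the outcome to the LBHT and GBHT at their write addresses."""
    data = history_write_data(resolved_taken)
    lbht[lbht_write_addr] = data   # step 732 (not taken) or step 736 (taken)
    gbht[gbht_write_addr] = data   # step 734 (not taken) or step 738 (taken)
```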

Additional alternative embodiments of methodology 600 may be implemented
in which a GSEL, such as GSEL 303, FIGURE 3A, may be updated when both the
LBHT and GBHT are correct or when neither is correct. These alternative
embodiments are summarized in Table 1, wherein case 1 and case 2 correspond to
steps 620-628 and steps 720-728 in FIGURES 6 and 7, respectively. An artisan of
ordinary skill would understand the adaptations of steps 620-628 or, alternatively,
steps 720-728 to implement the alternative embodiments in cases 3-9, Table 1, and
would recognize that these alternative embodiments are within the spirit and scope of
the present invention.



Table 1

Case   LBHT prediction correct   GBHT prediction correct   GSEL update
1.     Yes                       Yes                       No change
       No                        No                        No change
2.     Yes                       Yes                       write 0
       No                        No                        No change
3.     Yes                       Yes                       No change
       No                        No                        write 0
4.     Yes                       Yes                       No change
       No                        No                        write 1
5.     Yes                       Yes                       write 0
       No                        No                        write 0
6.     Yes                       Yes                       write 0
       No                        No                        write 1
7.     Yes                       Yes                       write 1
       No                        No                        No change
8.     Yes                       Yes                       write 1
       No                        No                        write 0
9.     Yes                       Yes                       write 1
       No                        No                        write 1
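
The nine cases of Table 1 lend themselves to a table-driven sketch. Table 1 only lists what happens when both predictions agree; the handling of the two disagreeing situations below (write 0 when only the LBHT is correct, write 1 when only the GBHT is correct) is carried over from steps 620 through 626 and is an assumption for cases 3 to 9. The names `GSEL_POLICY` and `gsel_write_value` are hypothetical, and `None` stands for "No change".

```python
# case -> (update when both correct, update when both incorrect)
GSEL_POLICY = {
    1: (None, None),
    2: (0, None),
    3: (None, 0),
    4: (None, 1),
    5: (0, 0),
    6: (0, 1),
    7: (1, None),
    8: (1, 0),
    9: (1, 1),
}


def gsel_write_value(case, lbht_correct, gbht_correct):
    """Return the bit to write to GSEL, or None for no change."""
    if lbht_correct and not gbht_correct:
        return 0          # only the LBHT was right: favor the LBHT
    if gbht_correct and not lbht_correct:
        return 1          # only the GBHT was right: favor the GBHT
    both_correct, both_incorrect = GSEL_POLICY[case]
    return both_correct if lbht_correct else both_incorrect
```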
The values of the GHV may be updated in accordance with methodology 800,
illustrated in FIGURES 8A and 8B. In step 802, the IFAR is set to the address of the
current instruction, and in step 804 the next group of instructions is fetched.

It is then determined, in step 806, whether an instruction fetch redirection has
been generated. An instruction fetch redirection may be generated by the processor
asserting a pipeline hold, as previously discussed hereinabove in conjunction with
FIGURE 4, or a cache miss. Additionally, a branch misprediction may generate an
instruction fetch redirection as, for example, in step 618 of methodology 600 in
FIGURE 6.

If, in step 806, an instruction fetch redirection has not occurred, the IFAR is
set to the next sequential instruction, step 808, and in step 810 the value in the register
portion of GHV1 logic 404, FIGURE 4, denoted GHV1, is copied to the register
portion of GHV2 logic 406 and the value, denoted GHV0, in the register portion of
GHV0 logic 402 is loaded in the register portion of GHV1 logic 404 to become a new
value of GHV1. Additionally, the value of GHV0 in the register portion of GHV0
logic 402 is left-shifted and the value "0" loaded in the LSB thereof, and latched into
the register portion of GHV0 logic 402. In an embodiment of the present invention,
with GHV logic 311 in accordance with FIGURE 4, step 810 may be performed using
GHV select logic 416, whereby step 810 effects advance path 426. Methodology
800 then returns to step 804.

If, however, in step 806, an instruction fetch redirection has been received, it
is determined if the redirection is a branch-taken prediction. In an embodiment of the
present invention in accordance with methodology 500, FIGURE 5, the branch-taken
fetch redirection may be generated as described in step 512, FIGURE 5. If, in
step 812, the fetch redirection is a taken prediction, then in step 814 a "1" is
left-shifted into the value obtained from the register portion of GHV1 logic 404,
denoted GHV1, and copied into the register portions of GHV0 logic 402, GHV1
logic 404 and GHV2 logic 406. In an embodiment of the present invention in
accordance with the GHV logic 311 of FIGURE 4, step 814 may be effected by GHV
select logic 416 rendering taken-prediction path 410 active. Also in step 814, the
IFAR is set to the branch target address of the branch predicted taken.
Methodology 800 then returns to step 804.

1 0 If, however, in step 8 1 2, a taken-prediction redirection has not been effected, 

in step 816, it is determined if a cache miss has occurred. If so, in step 818, the value, 
in the register portion of GHV2 logic 406, denoted GHV2, is loaded into the register 
portions of GHVO logic 402 and GHVl logic 404. In an embodiment of the present 
invention in accordance with the GHV logic of FIGURE 4, step 820 may be 

1 5 performed by GHV select logic 4 1 6 effecting cache miss path 408 via MUX 4 1 4, and 

GHVO select 413 selecting for the input of GHV logic 402 coupled to the output of 
MUX 414, GHVl select 41 1 selecting for the input of GHVl logic 404 coupled to the 
output of GHVO logic 402 and GHV2 select 409 selecting for the feedback input of 
GHV2 logic 406. Then, in step 820, the IFAR is backed up by three fetch cycles. 

That is, a state of the IFAR is returned to the state of the IFAR three fetch cycles
earlier than the current cycle. It is assumed, in the embodiment of the present
invention in accordance with methodology 800, that it takes three fetch cycles to
detect a cache miss. In a processor in which detection of a cache miss takes more
[...] number of cycles. Additionally, additional stages of GHV logic would be
incorporated in GHV logic 310, FIGURE 4, as discussed hereinabove. After the
IFAR is backed up in step 820, methodology 800 loops until the instruction [...]
from a lower level of a memory hierarchy or a fetch redirection is received, step [...].
Thereafter, methodology continues to step 804.
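
The cache miss recovery of steps 818 and 820 may be sketched as follows, by way of illustration only. The function name, the dictionary holding the three GHV values, and the per-cycle record of IFAR values are all assumptions made for the sketch; the description does not specify how earlier IFAR states are retained.

```python
from collections import deque

MISS_DETECT_CYCLES = 3  # per the description: three fetch cycles to detect a miss


def cache_miss_recover(ghv, ifar_history):
    """ghv maps 'GHV0', 'GHV1', 'GHV2' to register values; ifar_history holds
    one IFAR value per fetch cycle, oldest first, current cycle last."""
    # Step 818: GHV2 is loaded into GHV0 and GHV1; GHV2 keeps its own value.
    ghv["GHV0"] = ghv["GHV1"] = ghv["GHV2"]
    # Step 820: return the IFAR to its state three fetch cycles earlier.
    restored = ifar_history[-1 - MISS_DETECT_CYCLES]
    for _ in range(MISS_DETECT_CYCLES):
        ifar_history.pop()
    return restored


ghv = {"GHV0": 0b101, "GHV1": 0b010, "GHV2": 0b001}
history = deque([0x100, 0x120, 0x140, 0x160])
ifar = cache_miss_recover(ghv, history)
print(hex(ifar), ghv)
```

A processor needing a different number of cycles to detect a miss would change MISS_DETECT_CYCLES and, correspondingly, the number of GHV stages retained.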

If, however, in step 816 the instruction fetch redirection is not a cache miss, it
is determined in step 822 if the redirection is a pipeline hold. If so, in step 824, the
values of GHV0 in the register portion of GHV0 logic 402, GHV1 in the register
portion of GHV1 logic 404, and GHV2 in the register portion of GHV2 logic 406 are
held. In the embodiment of GHV logic 310 of FIGURE 4, GHV select logic 416
effects the hold path 412 in performing step 824. Methodology 800 then returns to
step 804.

If, in step 822, the redirection is not a pipeline hold, then the instruction fetch

redirection, determined in step 806, is a branch misprediction, and methodology 800
proceeds to step 828, FIGURE 8B. In step 828, the branch information from the BIQ
entry for the mispredicted branch is obtained. In step 830, it is determined if the
branch resolved as taken. If not, in step 832, it is then determined if the position of
the mispredicted branch in the fetch group is at the end of the fetch group. If not, in
step 834 the value of the GHV obtained from the BIQ in step 828 is loaded into the
register portions of GHV0 logic 402, GHV1 logic 404, and GHV2 logic 406 as the
value of, respectively, GHV0, GHV1, and GHV2. In an embodiment of GHV
logic 310 in accordance with FIGURE 4, step 834 may be performed by
misprediction selection logic 422 selecting for input 426 of MUX 424, in response to
a branch outcome signal and branch position signal. The branch position signal may
be received from BXU field 58e, FIGURE 3B, in accordance with the branch entry
illustrated therein. The branch outcome signal may be received from BXU 53 in
branch processing unit 54, FIGURE 2A. The branch outcome signal may have a first
predetermined value indicating that the branch resolved as taken and a second
predetermined value indicating that the branch resolved as not taken. Also, in
response to the misprediction, GHV select logic 416 selects for misprediction
path 420 and GHV0 select 413, GHV1 select 411 and GHV2 select 409 select for the
respective inputs of GHV0 logic 402, GHV1 logic 404 and GHV2 logic 406 coupled
to the corresponding one of the output of MUX 414 and the outputs of GHV0
logic 402 and GHV1 logic 404.

Additionally in step 834, the IFAR is set to the next sequential address of the 

branch address received in step 828. 

In step 836 all instructions after the mispredicted branch are discarded, and in
step 838 the branch history tables are updated. Branch history tables may be updated,
in step 838, in accordance with methodology 600, FIGURE 6, or, in an alternative
embodiment of the present invention, methodology 700, FIGURE 7.
Methodology 800 then returns to step 804, FIGURE 8A.

Returning now to step 832, if the mispredicted branch is at the end of the fetch
group then step 832 proceeds by the "Yes" branch to step 840. In step 840, a "0" is
left-shifted into the GHV received from the BIQ in step 828 and the result loaded into
GHV0, GHV1, and GHV2. Step 840 may be performed in the embodiment of GHV
logic 310 of FIGURE 4 by misprediction selection logic 422 selecting for input 430
in MUX 424. Additionally, GHV select logic 416 selects from misprediction
path 420, as previously described. Additionally, in step 840, the IFAR is set to the
next sequential address of the branch address as received in step 828.
Methodology 800 then proceeds to step 836, previously discussed.

Returning to step 830, if the mispredicted branch resolved taken, step 830
proceeds by the "Yes" branch to step 842. In step 842, a "1" is left-shifted into the
GHV received from the BIQ in step 828, and the result loaded into GHV0, GHV1,
and GHV2. Step 842 may be performed, in the embodiment of GHV logic 310 of
FIGURE 4, by misprediction selection logic 422 selecting for input 428 of MUX 424.
Additionally, GHV select logic 416 selects from misprediction path 420 as previously
discussed. Also, in step 842, the IFAR is set to the branch target address of the
mispredicted branch instruction. Methodology 800 then proceeds to step 836 as
previously described.
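
The three misprediction recovery cases of steps 834, 840 and 842 may be summarized, by way of illustration only, in the following sketch. The function name and its parameters are hypothetical, the eleven-bit width is taken from the example later in this description, and the four-byte instruction size used to form the next sequential address is an assumption.

```python
GHV_BITS = 11  # assumed width, matching the eleven-bit example in this description
GHV_MASK = (1 << GHV_BITS) - 1


def mispredict_recover(biq_ghv, resolved_taken, at_end_of_group,
                       branch_addr, target_addr, instr_bytes=4):
    """Return the recovered (GHV registers, IFAR) after a branch misprediction.

    biq_ghv is the GHV value saved in the BIQ entry of the mispredicted branch.
    """
    if resolved_taken:
        # Step 842: shift in a "1" and fetch from the branch target.
        ghv = ((biq_ghv << 1) | 1) & GHV_MASK
        ifar = target_addr
    elif at_end_of_group:
        # Step 840: the branch ends its fetch group, so shift in a "0".
        ghv = (biq_ghv << 1) & GHV_MASK
        ifar = branch_addr + instr_bytes
    else:
        # Step 834: restore the saved GHV unchanged.
        ghv = biq_ghv
        ifar = branch_addr + instr_bytes
    # In every case the same value is loaded into GHV0, GHV1 and GHV2.
    return {"GHV0": ghv, "GHV1": ghv, "GHV2": ghv}, ifar


regs, ifar = mispredict_recover(0b101, True, False, 0x100, 0x200)
print(regs, hex(ifar))
```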

In this way, branch prediction based on a prediction history is implemented
having a constant amount of processing. According to the principles of the present
invention, one bit is shifted into the global history vector for each fetch group. The
loading of a bit, a "one" or a "zero," in the global history vector essentially captures the
path the program has taken to reach the branch instruction being predicted, and
thereby provides an indication of how the branch will behave (taken or not-taken).

The following example illustrates the branch prediction mechanism in 
accordance with the present invention. Consider a sequence of instructions, which 
may be written in PowerPC™ assembly language as:

    (1)   LB00   addic   G0, G2, 0
    (2)   LB01   cmp     00, 0, G0, G1
    (3)          bc      0C, 02, <LB00>    /* Mispredicts first 5 times only */
    (4)          nop
    (5)          nop
    (6)          nop
    (7)          nop
    (8)          nop
    (9)          addic   G0, G1, 0
    (10)         add     G4, G4, G1
    (11)         cmp     00, 0, G4, G5
    (12)         bc      04, 02, <LB01>



(PowerPC™ is a trademark of International Business Machines Corp.) Although the
code snippet above is written in PowerPC™ assembly, it would be understood by an
artisan of ordinary skill in the art that the invention is not limited to the PowerPC™
processor, and, in particular, a similar sequence of operations could be written in an
assembly language corresponding to other microprocessor systems. In the above, G0,
G1, G2, G4, and G5 correspond to five general purpose registers. These are
initialized with the exemplary values 0, 1, 2, 0, and 32, respectively. The operations
performed by the above example include two branches, the instructions having the
mnemonic "bc". The "nop" (no operation) instructions are introduced to pad out a
fetch group corresponding to an embodiment of the present invention in which a fetch
group includes eight instructions. If the first branch is not taken, then the next
instruction executed is in the second fetch group, which starts with the ninth
instruction above, that is, the second "addic" instruction.

The first instruction moves the value in the register G2 into the register G0. The
second instruction, denoted by the mnemonic "cmp", compares the value in the
register G0 with the value in the register G1. In response to the comparison of the
contents of the register operands, the "cmp" instruction sets a bit in a selected field, in
this case, field 0, in a condition register. If the content of register G0 is larger than
the content of register G1, a first one of the plurality of bits in the selected field is set.
If the contents are equal, a second one of the plurality of bits is set, and if the contents
of register G1 exceed the contents of register G0, a third one of the plurality of bits in
the selected field of the condition register is set. Instruction (3), the first branch, acts
in response to the second bit in the selected field of the condition register. If the
second bit is set, the branch is taken; otherwise, the branch is not taken and the
sequential path is followed. In the above, the first branch instruction toggles, that is,
changes direction each time it executes, making prediction difficult.

Thus, the first time the first branch instruction, the third instruction above,
executes, the value in register G1 is 1, the initial value, and the value in register G0 is
two from the previous "addic" instruction, instruction (1). Thus, the "cmp"
instruction sets the first bit in the selected field, and instruction (3), the first branch
instruction, is not taken, and the sequential path is followed, fetching the next fetch
group, which begins with instruction (9), the second "addic" instruction.

Instructions (9)-(11) constitute a counter that counts up to the value of the
contents of register G5, and the second branch, instruction (12), branches to
instruction (2) with label "LB01". On returning to instruction (2), the contents of
registers G0 and G1 are equal by virtue of the second "addic" instruction,
instruction (9), which moves the contents of G1 to register G0. Because the contents
of these registers are equal, the first "cmp" instruction sets the second bit in the
selected field of the condition register, and the first "bc" branch instruction,
instruction (3), is taken, whereby the flow returns to instruction (1) with label
"LB00". Thus, in each of the iterations through the loop generated by the second
fetch group, instructions (9)-(12), the first branch instruction, instruction (3), is
executed twice and the direction toggles. In total the first branch instruction,
instruction (3), is executed sixty-three times in the instant example in which the
contents of register G5 equal thirty-two.
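
The count of sixty-three executions and the toggling direction of instruction (3) can be checked with a small software walk-through of the twelve-instruction sequence. The following is a minimal sketch only: the function name is hypothetical, the condition register is reduced to a single "equal" flag, and the first branch is modeled as taken on equal while the second is modeled as taken on not equal, consistent with the behavior described above.

```python
def run_example(g5=32):
    """Step through instructions (1)-(12) and record each outcome of
    the first branch, instruction (3): True for taken, False for not taken."""
    g = {"G0": 0, "G1": 1, "G2": 2, "G4": 0, "G5": g5}  # exemplary initial values
    first_branch_outcomes = []
    equal = False  # simplified condition register: only the "equal" result
    pc = 1
    while pc <= 12:
        if pc == 1:            # addic G0, G2, 0
            g["G0"] = g["G2"] + 0
        elif pc == 2:          # cmp G0, G1
            equal = g["G0"] == g["G1"]
        elif pc == 3:          # first branch: taken if equal, to LB00
            first_branch_outcomes.append(equal)
            if equal:
                pc = 1
                continue
        elif pc == 9:          # addic G0, G1, 0
            g["G0"] = g["G1"] + 0
        elif pc == 10:         # add G4, G4, G1
            g["G4"] = g["G4"] + g["G1"]
        elif pc == 11:         # cmp G4, G5
            equal = g["G4"] == g["G5"]
        elif pc == 12:         # second branch: taken if not equal, to LB01
            if not equal:
                pc = 2
                continue
        pc += 1                # instructions (4)-(8) are nops
    return first_branch_outcomes


outcomes = run_example()
print(len(outcomes))  # 63
```

The first outcome is not taken, and every later execution reverses the direction of the one before it.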

After an initial five mispredicts for the first branch, instruction (3), the path
history becomes a repetition of the pattern "011". (The initial value of register G5 of
thirty-two is sufficient to ensure that the global history vector settles to a steady state
value. However, an artisan of ordinary skill would understand that other exemplary
values could have been chosen.) At any particular fetch of the first fetch group which
includes the first branch, instruction (3), there are two possibilities for the global
history vector, in an embodiment of the present invention in which the global history
vector includes eleven bits. The history vector may either be "11011011011" or
"01101101101".



In the two possible sequences of the global history vector, the first branch is
perfectly predictable in accordance with the principles of the present
invention. In the first case, the prediction mechanism will predict not-taken because
the mechanism determines that the next value to be shifted into the history vector is
"zero". Similarly, in the second case, the mechanism of the present invention predicts
"taken" because the mechanism determines that the next value to be shifted into the
history vector is "one". In other words, the prediction mechanism in accordance with
the principles of the present invention recognizes the pattern repetition in the history
vector.
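
That each of the two steady-state history vectors determines the next bit can be illustrated with the following sketch. It simply replays the repeating per-fetch-group pattern "011" through an eleven-bit shift register; the function name is hypothetical, and the sketch models only the history vector, not the branch history tables or the five initial mispredicts.

```python
GHV_BITS = 11
GHV_MASK = (1 << GHV_BITS) - 1


def history_to_next_bit(pattern="011", repeats=20):
    """Map each full history vector seen at a fetch of the group containing
    instruction (3) to the set of bits that followed it."""
    bits = [int(c) for c in pattern * repeats]
    ghv = 0
    seen = {}
    for i, bit in enumerate(bits):
        # In the pattern, position 1 is the fetch group holding the second
        # branch; positions 0 and 2 are fetches holding instruction (3).
        if i >= GHV_BITS and i % 3 != 1:
            seen.setdefault(format(ghv, "011b"), set()).add(bit)
        # One bit is shifted into the history vector per fetch group.
        ghv = ((ghv << 1) | bit) & GHV_MASK
    return seen


print(history_to_next_bit())
```

Each of the two vectors is followed by exactly one value, so a table indexed by the history vector can hold a stable prediction for each.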

In sum, the present inventive concepts combine a local branch prediction
mechanism with a fetch-based branch prediction mechanism while only requiring
three tables. The selector table tracks the prediction performance of the fetch-based
and bimodal branch history tables. Advantageously, the branch history table
providing the better prediction performance can then be used to predict whether a
branch instruction will be taken or not taken through the instruction pipelines.
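
One conventional way to realize such a selector is a two-bit saturating counter per entry, sketched below by way of illustration only. The counter width, the threshold, and the update rule are assumptions drawn from common practice and are not taken from this description; the function names are hypothetical.

```python
SELECTOR_MAX = 3  # assumed two-bit saturating counter, values 0..3


def choose(selector, bimodal_pred, fetch_pred):
    """Use the fetch-based prediction when the counter favors it (>= 2)."""
    return fetch_pred if selector >= 2 else bimodal_pred


def update_selector(selector, bimodal_pred, fetch_pred, outcome):
    """Move the counter toward whichever table alone predicted correctly."""
    bimodal_ok = bimodal_pred == outcome
    fetch_ok = fetch_pred == outcome
    if fetch_ok and not bimodal_ok:
        return min(selector + 1, SELECTOR_MAX)
    if bimodal_ok and not fetch_ok:
        return max(selector - 1, 0)
    return selector  # both right or both wrong: no change


selector = 0
for outcome in (True, False):
    # Here the fetch-based table is right and the bimodal table is wrong.
    selector = update_selector(selector, not outcome, outcome, outcome)
print(selector, choose(selector, bimodal_pred=False, fetch_pred=True))
```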

Although the present invention and its advantages have been described in detail,
it should be understood that various changes, substitutions and alterations can be
made herein without departing from the spirit and scope of the invention as defined
by the appended claims.